In this paper we describe a trace analysis framework, from trace generation to visualization. It
includes a unified tracing facility on IBM SP systems, a self-defining interval file format, an API for
framework extensions, utilities for merging and statistics generation, and a visualization tool with
preview and multiple time-space diagrams. The trace environment is extremely scalable, and
combines MPI events with system activities in the same set of trace files, one for each SMP node.
Sin ...
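The abstract mentions merging utilities for combining the per-node trace files. The paper's file format and merge tool are not described in this excerpt, so the following is only a minimal Python sketch of one common approach. It assumes each node's trace is already sorted by timestamp, that all nodes share a synchronized clock, and it uses a hypothetical `Event` record and `merge_node_traces` helper. It performs a k-way merge with `heapq.merge`.

```python
import heapq
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical trace record; not the paper's interval file format."""
    timestamp: float  # assumes a globally synchronized clock across nodes
    node: int
    name: str


def merge_node_traces(traces: Iterable[Iterable[Event]]) -> list[Event]:
    """k-way merge of per-node traces, each already sorted by timestamp."""
    return list(heapq.merge(*traces, key=lambda e: e.timestamp))


# Illustrative per-node traces mixing message-passing and system events.
node0 = [Event(0.0, 0, "MPI_Send"), Event(2.5, 0, "page_fault")]
node1 = [Event(1.0, 1, "MPI_Recv"), Event(3.0, 1, "MPI_Barrier")]

for e in merge_node_traces([node0, node1]):
    print(e.timestamp, e.node, e.name)
```

A heap-based merge reads each input in order and never holds more than one pending record per node, which suits a one-file-per-node layout. Real multi-node traces usually need clock-offset correction before merging.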
2 Techniques for efficient inline tracing on a shared-memory multiprocessor
While much current research concerns multiprocessor design, few traces of parallel programs are
available for analyzing the effect of design trade-offs. Existing trace collection methods have 
serious drawbacks: trap-driven methods often slow down program execution by more than 1000 
times, significantly perturbing program behavior; microcode modification is faster, but the 
technique is neither general nor portable. This paper describes a new tool, called MPTRACE, for 
collecting tr ... 



3 Designing a trace format for heap allocation events 
Dynamic storage allocation continues to play an important role in the performance and correctness
of systems ranging from user productivity software to high-performance servers. While algorithms 
for dynamic storage allocation have been studied for decades, much of the literature is based on 
measuring the performance of benchmark programs unrepresentative of many important 
allocation-intensive workloads. Furthermore, to date no standard has emerged or been proposed 
for publishing and exchangin ... 
2 Hot cold optimization of large Windows/NT applications
Robert Cohn, P. Geoffrey Lowney 

December 1996 Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture MICRO 29
Publisher: IEEE Computer Society 


A dynamic instruction trace often contains many unnecessary instructions that are required only 
by the unexecuted portion of the program. Hot-cold optimization (HCO) is a technique that 
realizes this performance opportunity. HCO uses profile information to partition each routine into
frequently executed (hot) and infrequently executed (cold) parts. Unnecessary operations in the 
hot portion are removed, and compensation code is added on transitions from hot to cold as 
needed. We evaluate HCO on a ... 
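To make the partitioning step concrete, here is a minimal Python sketch. It assumes per-block execution counts are already available from a profile and applies a hypothetical rule: a block is hot if its count reaches 1% of the routine's busiest block. The names `Block`, `partition_hot_cold`, and `hot_to_cold_edges` are illustrative and do not come from the paper, which may partition differently. The sketch only finds the hot-to-cold edges where compensation code would be needed. It does not remove any operations.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    count: int  # profiled execution count (assumed input)
    succs: list[str] = field(default_factory=list)


def partition_hot_cold(blocks: dict[str, Block], threshold: float = 0.01):
    """Split a routine's blocks into hot and cold sets.

    Hypothetical rule: a block is hot if its count is at least
    `threshold` times the routine's maximum block count.
    """
    peak = max((b.count for b in blocks.values()), default=0)
    hot = {n for n, b in blocks.items() if peak > 0 and b.count >= threshold * peak}
    cold = set(blocks) - hot
    return hot, cold


def hot_to_cold_edges(blocks: dict[str, Block], hot: set[str]):
    """Edges leaving the hot region: where compensation code would be placed."""
    return sorted((n, s) for n in hot for s in blocks[n].succs if s not in hot)


# Illustrative routine: a hot loop with a rarely taken error path.
routine = {
    "entry": Block("entry", 1000, ["loop"]),
    "loop": Block("loop", 100_000, ["loop", "exit", "error"]),
    "error": Block("error", 3, ["exit"]),
    "exit": Block("exit", 1000, []),
}
hot, cold = partition_hot_cold(routine)
print(sorted(hot), sorted(cold))  # ['entry', 'exit', 'loop'] ['error']
print(hot_to_cold_edges(routine, hot))  # [('loop', 'error')]
```

Under this rule, the optimizer could treat the hot blocks as a separate routine and leave `error` in the cold copy. The single `loop -> error` edge would then be the only transition that needs compensation code.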
Automatic formal verification of DSP software
This paper describes a novel formal verification approach for equivalence checking of small,
assembly-language routines for digital signal processors (DSP). By combining control-flow 
analysis, symbolic simulation, automatic decision procedures, and some domain-specific 
optimizations, we have built an automatic verification tool that compares structurally similar DSP 
assembly language routines. We tested our tool on code samples taken from a real application 
program and discovered several pr ... 

4 Practical experiences: Security-driven exploration of cryptography in DSP cores 
Catherine H. Gebotys 

October 2002 Proceedings of the 15th international symposium on System Synthesis ISSS '02

Publisher: ACM Press 


With the popularity of wireless communication devices a new important dimension of embedded 
systems design has arisen, that of security. This paper presents for the first time design 
exploration for secure implementation of cryptographic applications on a complex DSP processor 
core. A new metric for security, the implementation security index, is introduced for measuring 
resistance to power attacks. Elliptic curve cryptographic algorithms are used to demonstrate and ...

Adding trace matching with free variables to AspectJ
Chris Allan, Pavel Avgustinov, Aske Simon Christensen, Laurie Hendren, Sascha Kuzins, Ondřej
Lhoták, Oege de Moor, Damien Sereni, Ganesh Sittampalam, Julian Tibble
October 2005 ACM SIGPLAN Notices, Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications OOPSLA '05, Volume 40 Issue 10
Publisher: ACM Press 


An aspect observes the execution of a base program; when certain actions occur, the aspect runs 
some extra code of its own. In the AspectJ language, the observations that an aspect can make
are confined to the current action: it is not possible to directly observe the history of a 
computation. Recently, there have been several interesting proposals for new history-based 
language features, most notably by Douence et al. and by Walker and Viggers. In this paper, we 
present a ne ... 
2 A dynamic multithreading processor
3 Instruction fetch and control flow: Power-efficient instruction delivery through trace reuse
Chengmo Yang, Alex Orailoglu

September 2006 Proceedings of the 15th international conference on Parallel architectures and compilation techniques PACT '06
Publisher: ACM Press 


As power dissipation inexorably becomes the major bottleneck in system integration and 
reliability, the front-end instruction delivery path in a traditional out-of-order superscalar 
processor needs to deliver high application performance in an energy-effective manner. This 
challenge can be addressed by efficiently reusing the work of fetch and decode performed during 
preceding loop iterations and resident mostly within the processor itself. As a large percentage of 
the instructions currently und ... 
Parallel Java environments present challenging problems for performance tools because of Java's
rich language system and its multi-level execution platform combined with the integration of
native-code application libraries and parallel runtime software. In addition to the desire to provide ...






5/30/2007 9:23 PM 



